Motivation

As NBA fans, we are witnessing what is probably the greatest revolution the court has ever seen: the “small ball era”. Teams increasingly favor smaller, faster players over traditional big men in order to speed up play and improve shooting efficiency.

Last year, our favorite team, the Knicks, returned to the playoffs after eight years (link), which brought great joy to the fans. To help make this performance last, we set out to identify the key variables that contribute to winning, so that the Knicks can maintain their existing strengths and address their weaknesses.

The most obvious features of the “small ball era” are a faster pace and the rise in three-point shot attempts. We therefore focus on three-point-related variables, along with other factors, in our analysis.

Questions

To reach the playoffs, a team must rank in the top eight of the fifteen teams in either the Eastern or the Western Conference, and the ranking is determined by the number of games a team wins in the regular season. We therefore explore how the number of games won by a team is associated with its average performance factors, and give suggestions on how to improve them.

  1. What is the threshold number of wins to enter the playoffs?
    First, we explore the distribution of the number of games won by NBA playoff teams to find the threshold number of wins needed to enter the playoffs.

  2. Which variables contribute to the total wins of a season and to winning a single game?
    Next, we want to identify the contributors to the total wins of a season and to a single-game win. Assuming that total wins are independent across teams and years, and that the result of one game is independent of another, we used a linear regression model to fit total wins with average performance predictors, and logistic regression for the result of a single game.

  3. Will the Knicks get into the playoffs according to the current model's prediction?
    In addition, we use our model and the new data to predict the number of Knicks wins in the 2021-22 season, and to see whether the team can get into the playoffs.

  4. What is the difference in average performance between the Knicks and the top teams?
    Based on the regression results, we analyse the Knicks' performance on the key predictors to see the team's strengths and weaknesses.

  5. Which specific players cause these weaknesses, and how can the overall performance be improved?
    Finally, we drill down from the team level to the player level to find how leading players should improve to get more wins. A detailed game and training strategy is proposed.

Data

Data Source

Because our project needs detailed stats about NBA teams and players from the last 10 NBA regular seasons, we used web scraping to get official advanced data from NBA Stats. We mainly used four datasets:

  1. Advanced Box Score: each observation represents one game and its stats, including the score, total field goal attempts, three-pointers made and so on.

  2. Playtype by Team: this dataset contains season averages for each team by offensive play type, such as isolation, pick and roll, etc. Each observation represents a team's average in a regular season for a specific play type.

  3. Tracking: this dataset contains detailed information about NBA teams' average movement data in a regular season, for example passes and touches.

  4. Knicks Shooting Log: each observation represents a field goal made by a Knicks player, including the player who made the shot, the location of the shot, and the time remaining when the shot was made.

As mentioned above, these datasets were scraped from the NBA website; the scraping code can be found at scrapping data.

Data Wrangling

We ran into a major problem at the very beginning: the most detailed NBA stats website, NBA Stats, does not provide a public API for scraping. So we followed a blogger's approach and wrote a function that extracts the data using request details found with the browser devtools. (link)

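library(httr)      # GET(), add_headers(), content()
library(jsonlite)  # fromJSON()
library(tidyverse) # %>%, tibble(), map(), str_c(), read_csv(), ...

# Request one stats.nba.com endpoint and return its first result set as a data frame.
scrapping_data = function(url) {
  # Browser-like request headers copied from the devtools network tab
  headers = c(
  `Connection` = 'keep-alive',
  `Accept` = 'application/json, text/plain, */*',
  `x-nba-stats-token` = 'true',
  `X-NewRelic-ID` = 'VQECWF5UChAHUlNTBwgBVw==',
  `User-Agent` = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36', 
  `x-nba-stats-origin` = 'stats',
  `Sec-Fetch-Site` = 'same-origin',
  `Sec-Fetch-Mode` = 'cors',
  `Referer` = 'https://stats.nba.com/players/leaguedashplayerbiostats/',
  `Accept-Encoding` = 'gzip, deflate, br',
  `Accept-Language` = 'en-US,en;q=0.9')
  response = GET(url, add_headers(.headers = headers))
  data = fromJSON(content(response, as = "text"))
  # rowSet holds the rows; headers holds the matching column names
  df = data.frame(data$resultSets$rowSet[[1]], stringsAsFactors = FALSE)
  names(df) = tolower(data$resultSets$headers[[1]])
  return(df)
}

# Drop the trailing column only when it has no name (NA); named columns are kept.
drop_last_column = function(df) {
  last = ncol(df)
  if (last > 0 && is.na(names(df)[last])) {
    df = df[, -last, drop = FALSE]
  }
  return(df)
}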
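
To make the parsing step inside scrapping_data concrete, here is a minimal sketch that runs on a small inline JSON string instead of a live request. The JSON is a made-up stand-in that only mimics the resultSets / headers / rowSet layout the function relies on; the real response has many more columns, and the values below are invented.

```r
library(jsonlite)

# Hypothetical miniature response with the same nesting as the real one
toy_json = '{"resultSets":[{"name":"TeamGameLogs",
  "headers":["TEAM_ABBREVIATION","WL","PTS"],
  "rowSet":[["NYK","W",110],["BOS","L",102]]}]}'

parsed = fromJSON(toy_json)

# Same two steps as in scrapping_data: rows first, then lower-case column names
df = data.frame(parsed$resultSets$rowSet[[1]], stringsAsFactors = FALSE)
names(df) = tolower(parsed$resultSets$headers[[1]])

# Convert the points column explicitly before doing arithmetic on it
df$pts = as.numeric(df$pts)
df
```

Columns parsed this way are not guaranteed to have the types we want, so an explicit conversion (or a round trip through a CSV file, which re-guesses the column types on reading) is a reasonable safeguard before any numeric work.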

Then we extract the datasets and variables we want and save them locally for later use. Here is one example dataset, box_score_all.

season_years = c("2020-21", "2019-20", 
           "2018-19", "2017-18", 
           "2016-17", "2015-16", 
           "2014-15", "2013-14", 
           "2012-13", "2011-12", 
           "2010-11", "2009-10", 
           "2008-09", "2007-08", 
           "2006-07", "2005-06", 
           "2004-05", "2003-04", 
           "2002-03", "2001-02")

box_score_all = tibble(
  season_year = season_years,
  url = str_c("https://stats.nba.com/stats/teamgamelogs?DateFrom=&DateTo=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=Totals&Period=0&PlusMinus=N&Rank=N&Season=", season_year, "&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&VsConference=&VsDivision="),
  box_score = map(url, scrapping_data)) %>% 
  mutate(box_score = map(box_score, drop_last_column)) %>% # last column of each box score is NA
  select(-season_year, -url) %>% 
  unnest(cols = box_score)

write_csv(box_score_all, "./data2/box_score_all.csv")

We then only need to read them and select the relevant variables each time we use them.

box_score_all = read_csv("./data2/box_score_all.csv") %>% 
  janitor::clean_names() %>% 
  select(-contains("rank"))

pass_df = 
  read_csv("./data2/pass_df.csv") %>% 
  select(season_year, team_abbreviation, passes_made)

isol_df = 
  read_csv("./data2/isol_df.csv") %>% 
  select(season_year, team_abbreviation, poss) %>% 
  rename(poss_iso = poss)

prbh_df = 
  read_csv("./data2/prbh_df.csv") %>% 
  select(season_year, team_abbreviation, poss) %>% 
  rename(poss_prb = poss)

prrm_df = 
  read_csv("./data2/prrm_df.csv") %>% 
  select(season_year, team_abbreviation, poss) %>% 
  rename(poss_prr = poss)

defend_df = 
  read_csv("./data2/defensive_impact_df.csv") %>% 
  select(season_year, team_abbreviation, stl, blk, dreb)

trans_df = 
  read_csv("./data2/transition_df.csv") %>% 
  select(season_year, team_abbreviation, poss) %>% 
  rename(poss_trans = poss)

Because these are raw datasets recorded by game, we need to summarize them, add and delete variables, and so on. The detailed wrangling process can be seen here.

Average Data Frame

The avg_df contains the number of wins by team and year in the last 20 years, which is used to analyse the distribution of threshold wins, and to further generate predict_df for regression.

avg_df = 
  box_score_all %>% 
  select(season_year, team_abbreviation, wl, pts, ast, tov, fgm, fga, fg3m, fg3a) %>%
  mutate(
    win = case_when(wl == "W" ~ 1, TRUE~0),
    game_num = 1,
    fg3a_p = round(fg3a/fga, digits = 3),
    team_abbreviation = str_replace(team_abbreviation, "NOH", "NOP"), 
    team_abbreviation = str_replace(team_abbreviation, "NJN", "BKN"),
    conference = case_when(
      team_abbreviation %in% c("UTA","PHX","LAC","DEN","DAL","LAL","POR","GSW","SAS","MEM","NOP","SAC","MIN","OKC","HOU","SEA","NOK","CHH")~"west",
      team_abbreviation %in% c("PHI","BKN","MIL","ATL","NYK","MIA","BOS","IND","WAS","CHI","TOR","CLE","ORL","DET","CHA")~"east") # divide into east and west conference
    ) %>% 
  group_by(season_year, team_abbreviation, conference) %>% 
  summarise(
    wins = sum(win), 
    games = sum(game_num), 
    games_should = 82, 
    pts_avg = round(mean(pts), digits = 1), 
    ast_avg = round(mean(ast), digits = 1),
    tov_avg = round(mean(tov), digits = 1),
    fgm_total = sum(fgm), 
    fga_total = sum(fga), 
    fg3m_total = sum(fg3m), 
    fg3a_total = sum(fg3a)
    ) %>% 
  mutate(wins_revised = round(wins/games*games_should,0)) %>% # rescale to 82 games: the 2011-12 labor negotiation and COVID-19 shortened some seasons
  relocate(season_year, team_abbreviation, conference, wins, wins_revised, everything()) %>% 
  arrange(desc(season_year),desc(wins)) %>% 
  mutate(fg3_p = fg3a_total/fga_total, fg3_r = fg3m_total/fg3a_total) %>% 
  group_by(season_year,conference) %>% 
  mutate(
    conf_rank = row_number(),
    play_off_team = case_when(
           conf_rank <= 8 ~ "playoff", 
           conf_rank > 8 ~ "non-playoff"
         ), 
         play_off_team = fct_relevel(play_off_team, c("playoff", "non-playoff")))
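
As a worked example of the wins_revised rescaling used above, the sketch below applies the same formula to a shortened season. The helper name scale_wins and the win-loss records are illustrative assumptions, not part of our pipeline.

```r
# Hypothetical helper mirroring the wins_revised formula in avg_df
scale_wins = function(wins, games, games_should = 82) {
  round(wins / games * games_should, 0)
}

# Illustrative record: 36 wins in a 66-game season
# 36 / 66 * 82 = 44.73, which rounds to 45
short_season = scale_wins(36, 66)

# A full 82-game season is left unchanged
full_season = scale_wins(41, 82)

c(short_season = short_season, full_season = full_season)
```

Rescaling keeps win totals from shortened seasons comparable with full seasons when we look at the playoff threshold.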

Predict Data Frame

The predict_df contains the average performance data and the total number of games won over the last 8 years, and is used to build models and predict the number of wins. We include not only the basic average stats such as points, assists and turnovers from avg_df, but also combine avg_df with 6 other data frames to add play type and defensive data.

predict_df = 
  avg_df %>%
  left_join(defend_df, by = c("season_year","team_abbreviation")) %>% 
  left_join(prrm_df, by = c("season_year","team_abbreviation")) %>% 
  left_join(prbh_df, by = c("season_year","team_abbreviation")) %>%
  left_join(isol_df, by = c("season_year","team_abbreviation")) %>% 
  left_join(pass_df, by = c("season_year","team_abbreviation")) %>%
  left_join(trans_df, by = c("season_year","team_abbreviation")) %>% 
  drop_na(poss_trans, passes_made, poss_iso, poss_prb, poss_prr, stl, blk, dreb) %>% 
  mutate(
    poss_pr = poss_prr + poss_prb
  ) %>% 
  select(-poss_prr, -poss_prb, -wins, -games, -games_should, -fgm_total, -fga_total)

Box Score Data Frame for Visualization

This data frame contains 23476 observations from the last 10 seasons and is mainly used for analyzing the trends of different stats between playoff teams and non-playoff teams.

box_score_viz = 
  box_score_all %>% 
  filter(season_year %in% c("2011-12", "2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21")) %>% 
  mutate(team_abbreviation = str_replace(team_abbreviation, "NOH", "NOP"), 
    team_abbreviation = str_replace(team_abbreviation, "NJN", "BKN")) %>% 
  select(season_year, team_abbreviation, wl, pts, ast, tov, fgm, fga, fg3m, fg3a, stl, blk, dreb) %>%
  mutate(
    win = case_when(wl == "W" ~ 1, TRUE~0),
    game_num = 1, 
    conference = case_when(
      team_abbreviation %in% c("UTA","PHX","LAC","DEN","DAL","LAL","POR","GSW","SAS","MEM","NOP","SAC","MIN","OKC","HOU","SEA","NOK","CHH")~"west",
      team_abbreviation %in% c("PHI","BKN","MIL","ATL","NYK","MIA","BOS","IND","WAS","CHI","TOR","CLE","ORL","DET","CHA")~"east"), # divide into east and west conference
    fg3a_p = round(fg3a/fga, digits = 3),
    fg3_r = round(fg3m/fg3a, digits = 3)
    ) %>% 
  relocate(season_year, team_abbreviation, conference)

conf_rank = 
  avg_df %>% 
  filter(season_year %in% c("2011-12", "2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21")) %>% 
  ungroup() %>% 
  select(season_year, team_abbreviation, conference, conf_rank)


# join the two tables together
box_score_viz = 
  box_score_viz %>% 
  left_join(conf_rank, by = c("season_year", "team_abbreviation", "conference")) %>% 
  mutate(play_off_team = case_when(
           conf_rank <= 8 ~ "playoff", 
           conf_rank > 8 ~ "non-playoff"
         ), 
         play_off_team = fct_relevel(play_off_team, c("playoff", "non-playoff")), 
         fg3p = fg3m / fg3a) %>% 
  relocate(season_year, team_abbreviation, conference, play_off_team)

Data Frame for Logistic Regression

This data frame contains per-game stats and is used for logistic regression. We drop the variables we do not need and convert the win/loss indicator to a factor variable.

regre_df = 
  box_score_all %>%
  select(-c(1:7)) %>%
  select(-ends_with("rank")) %>%
  mutate(wl = recode(wl, "W" = 1, "L" = 0),
         wl = as.factor(wl)) 

Data Description

We fit two regression models with the data above. One fits the number of wins in a season with average performance. The other fits the win or loss of a single game with the stats of that game.

1. Predict the number of wins

The dependent variable is the number of wins by team and season, denoted by wins_revised.

Independent variables are selected from both the offensive and the defensive side.

The typical attributes of the “small ball era” are more three-point shooting and a faster pace. So we select the following variables to represent the offensive level of a team:

  • fg3_p: proportion of field goal attempts that are three-point attempts
  • fg3_r: three-point shooting percentage (three-pointers made / three-pointers attempted)
  • pts_avg: average points per game
  • tov_avg: average number of turnovers per game
  • ast_avg: average number of assists per game
  • poss_trans: average number of transitions
  • passes_made: average number of passes per game
  • poss_iso: average number of isolations per game
  • poss_pr: average number of pick and rolls

As for the defensive level, variables include:

  • stl: average steals per game
  • blk: average blocks per game
  • dreb: average defensive rebounds per game
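
To illustrate how a model of this form is fit, here is a minimal sketch of a linear regression of wins_revised on a few of the predictors listed above. The data are synthetic, generated with made-up coefficients purely so that the code runs on its own; nothing here comes from predict_df, and the subset of predictors is an arbitrary choice.

```r
set.seed(1)
n = 60

# Synthetic team-season averages (invented ranges, for illustration only)
toy_df = data.frame(
  fg3_p   = runif(n, 0.20, 0.45),
  fg3_r   = runif(n, 0.32, 0.40),
  tov_avg = runif(n, 12, 17),
  dreb    = runif(n, 30, 37)
)

# Synthetic response built from made-up effects plus noise
toy_df$wins_revised = round(
  41 + 40 * (toy_df$fg3_p - 0.3) + 150 * (toy_df$fg3_r - 0.36) -
    2 * (toy_df$tov_avg - 14.5) + 1.5 * (toy_df$dreb - 33.5) +
    rnorm(n, sd = 4)
)

# Ordinary least squares fit, as with lm() on the real data
fit = lm(wins_revised ~ fg3_p + fg3_r + tov_avg + dreb, data = toy_df)
round(coef(summary(fit)), 3)

# Predicted wins for one hypothetical team profile
new_team = data.frame(fg3_p = 0.38, fg3_r = 0.37, tov_avg = 13.5, dreb = 34)
predicted_wins = predict(fit, newdata = new_team)
predicted_wins
```

On the real data the formula would simply list the full set of offensive and defensive variables, and the predicted value can be compared with the playoff threshold.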

2. Variables used to predict the win or loss of a single game can be viewed here
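
For the single-game model, a minimal sketch of a logistic regression with a 0/1 factor outcome (like wl in regre_df) could look as follows. The games are simulated with made-up effects, and the three predictors are only an illustrative subset of box score columns.

```r
set.seed(2)
n = 200

# Simulated per-game stats (invented ranges, for illustration only)
toy_games = data.frame(
  fg_pct = runif(n, 0.38, 0.55),
  fta    = round(runif(n, 12, 35)),
  tov    = round(runif(n, 8, 20))
)

# Simulated outcome: higher fg_pct and fta help, turnovers hurt (made-up effects)
lin_pred = -12 + 25 * toy_games$fg_pct + 0.05 * toy_games$fta - 0.1 * toy_games$tov
toy_games$wl = factor(rbinom(n, 1, plogis(lin_pred)), levels = c(0, 1))

# Logistic regression: models the log-odds of a win (wl == 1)
fit_logit = glm(wl ~ fg_pct + fta + tov, data = toy_games, family = binomial)
round(coef(fit_logit), 3)

# Fitted win probabilities for the simulated games
win_prob = predict(fit_logit, type = "response")
head(round(win_prob, 3))
```

With a factor response, glm treats the first level (0, a loss) as failure, so positive coefficients correspond to higher odds of winning.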

Exploratory Analysis

In this part, we explore which variables differ between teams that made the playoffs and teams that did not. This gives us some insight into choosing potential parameters for model building. Specifically, we identify the trend in three-point attempts over the past 10 seasons.

Find the distribution of “threshold”

Recall that the “threshold” is the number of games won by the 8th-ranked team in each of the Western and Eastern Conferences.

We plot the threshold over the last 20 years.

eighth_wins =
  avg_df %>% 
  filter(conf_rank == 8)


ggplot(data = eighth_wins,aes(x = wins_revised)) + 
  geom_bar(fill = "blue", alpha = 0.5) + 
  theme_bw() 

There are 40 observations of the threshold in the past 20 years. Their distribution looks roughly normal, with mean 42.35 and variance 15.46.
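
The summary statistics and the normality description above can be checked with a few lines of base R. This is only a sketch on hypothetical win totals, not the values from eighth_wins; on the real data, toy_threshold would be replaced by eighth_wins$wins_revised.

```r
# Hypothetical eighth-seed win totals (invented values)
toy_threshold = c(38, 41, 42, 44, 45, 40, 43, 47, 39, 46)

# Sample mean and variance of the threshold
threshold_mean = mean(toy_threshold)
threshold_var  = var(toy_threshold)

# Shapiro-Wilk test: a small p-value would cast doubt on normality
normality_check = shapiro.test(toy_threshold)

threshold_mean
threshold_var
normality_check$p.value
```

With only 40 observations, such a test has limited power, so it complements the bar plot rather than replacing it.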

Next, we drill into the factors associated with the number of wins per season for a team and try to find significant contributors to more wins.

Difference in average scores

First, we look at how the points scored per game were distributed over the last 10 seasons for teams that made the playoffs and teams that did not, as shown in the figure below.

non_play_off = 
  box_score_viz %>% 
  filter(play_off_team == "non-playoff") 

box_score_viz %>% 
  filter(play_off_team == "playoff") %>% 
  ggplot(aes(x = pts, y = season_year)) + 
  geom_density_ridges(scale = .8, alpha = .5, fill = "blue", 
                      quantile_lines = T, quantile_fun = mean) + 
  geom_density_ridges(data = non_play_off, aes(x = pts, y = season_year), 
                      scale = .8, alpha = .5, fill = "salmon", 
                      quantile_lines = T, quantile_fun = mean) + 
  scale_fill_manual(name = "Team", values = cols) + 
  xlim(65, 140) + 
  labs(x = "Scores", 
       y = "Season Year", 
       title = "Score Distribution Between Playoff and Non-Playoff Teams")  

Two things are obvious.

First, over the last 10 regular seasons, points scored per game show an increasing trend. Second, teams that made the playoffs have higher average scores than teams that did not.

The second pattern is easy to understand: playoff teams outscore non-playoff teams because scoring more lets them win more. The rise in average score across all NBA teams is likely due to the small ball revolution, in which teams play faster, get more shot attempts and take a larger share of three-point shots.

Three Point Parameters

Next, we use the box score data and team average data to examine potential variables that contribute to winning games.

It is clear that three-point attempts as a share of all field goal attempts increased over the last 10 seasons, which matches the “Small Ball Revolution” and our finding that points per game increased over the same period. In addition, teams that made the playoffs have a higher three-point attempt percentage during a game, which suggests the three-point attempt percentage might be a contributor to the number of wins.

It is also apparent that the three-point shooting percentage is higher among playoff teams than non-playoff teams, since a higher shooting percentage translates into more points per game. Another pattern in this plot is that the variation in three-point shooting percentage narrows over time. This may reflect the attention teams have paid to three-point shooting: if players train more on shooting, their shooting becomes more stable.
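
The two three-point measures discussed here are simple ratios of box score columns. The sketch below computes them on a few invented game rows and averages them by team type; the numbers are hypothetical and only show the arithmetic.

```r
# Invented per-game rows (not taken from box_score_viz)
toy_box = data.frame(
  play_off_team = c("playoff", "playoff", "non-playoff", "non-playoff"),
  fga  = c(88, 90, 85, 92),   # field goal attempts
  fg3a = c(36, 40, 28, 30),   # three-point attempts
  fg3m = c(14, 16, 9, 10)     # three-pointers made
)

# Share of attempts taken from three, and three-point shooting percentage
toy_box$fg3a_p = round(toy_box$fg3a / toy_box$fga, 3)
toy_box$fg3_r  = round(toy_box$fg3m / toy_box$fg3a, 3)

# Group means, analogous to comparing the two boxes within a season
by_group = aggregate(cbind(fg3a_p, fg3_r) ~ play_off_team, data = toy_box, FUN = mean)
by_group
```

The box plots that follow show the full per-game distribution of these ratios by season rather than just the group means.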

Three-Point Field Goal Attempt Percent

plot_ly(box_score_viz, x = ~ season_year, y = ~ fg3a_p, color = ~ play_off_team, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Three-Point Field Goal Attempt Percent"))

Three Point shooting Rate

plot_ly(box_score_viz, x = ~ season_year, y = ~ fg3p, color = ~ play_off_team, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Three Pointer Rate"))

Offensive Play Type

Next, we explore the influence that play types have on average wins. If a play type clearly contributes to the number of wins of a team, we would suggest that the Knicks design more offense of that type.

The average isolations per game of playoff teams have been almost equal from the 2013-14 season to now, while the average isolations per game of non-playoff teams tended to decrease over time. The clustering of superstars might account for this phenomenon, because superstars are able to run more isolation plays. As superstars joined playoff teams, the isolations of non-playoff teams decreased.

Pick and roll is a common form of offensive teamwork. We can see from the Pick and Roll plot that the average pick and rolls per game tended to increase in the last 8 years, and that playoff teams ran fewer of them than non-playoff teams, which closely matches the isolation pattern.

A transition play is one in which the team that was defending immediately launches a fast break after a rebound or a steal, before the opposing defense is set. It is an important way to speed up and score easily and quickly. Average transitions rose gradually, probably because they are more efficient. Non-playoff teams seemed to run more transitions than playoff teams. This does not mean that more transitions lead to fewer wins; more likely, non-playoff teams are weaker in half-court offense and therefore tend to rely more on transitions.

Isolation

play_tp_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(iso_mean = mean(poss_iso)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Isolation: ", round(iso_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ iso_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Isolation"))

Pick and Roll

play_tp_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(pr_mean = mean(poss_pr)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Pick and Roll: ", round(pr_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ pr_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Pick and Roll"))

Transition

play_tp_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(trans_mean = mean(poss_trans)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Transition: ", round(trans_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ trans_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Transition"))

Season Average Movement Parameters

Blocks are a key defensive parameter. More blocks mean that opponents have a lower chance of scoring. Playoff teams recorded more blocks than non-playoff teams.

Steals are also a defensive parameter, and each steal is accompanied by an opponent turnover. There was no apparent trend in steals over time.

Too many turnovers can cost a team the game. The turnover plot shows that the average turnovers of playoff teams were lower than those of non-playoff teams.

Defensive rebounds prevent the opponent's second-chance attempts and thereby reduce its score. As we can see, playoff teams grabbed more defensive rebounds than non-playoff teams.

The number of passes per game reflects offensive fluency. An adequate number of passes can create good shooting opportunities, but many passes without a good shooting opportunity indicate poor offensive ability. In the passes plot, non-playoff teams had higher average passes per game than playoff teams.

Block

avg_viz_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(blk_mean = mean(blk)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Block: ", round(blk_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ blk_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Block"))

Steal

avg_viz_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(stl_mean = mean(stl)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Steal: ", round(stl_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ stl_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Steal"))

Turnover

avg_viz_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(tov_mean = mean(tov_avg)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Turnover: ", round(tov_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ tov_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Turnover"))

Defensive Rebound

avg_viz_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(dreb_mean = mean(dreb)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Defensive Rebound: ", round(dreb_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ dreb_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Defensive Rebound"))

Passes

avg_viz_df %>% 
  group_by(season_year, play_off_team) %>% 
  summarise(passes_mean = mean(passes_made)) %>% 
  mutate(text_label = str_c("Team Type: ", play_off_team, 
                            "\nAverage Passes: ", round(passes_mean, 2))) %>% 
  plot_ly(x = ~ season_year, y = ~ passes_mean, type = "bar",
    color = ~ play_off_team, text = ~text_label, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average Passes"))

Exploratory Analysis

In this part, we look at every game in the past 20 years and try to find important variables that might affect the result of a game. This process also gives us some insight into choosing potential parameters for model building.

Difference in average scores

First, we look at the difference in scores between winning and losing teams over the past 20 years.

We can see that the score a team needs in order to win has become much higher than in past years. The average score of winning teams went up and down from the 2001 to 2015 seasons; however, after the start of the small ball revolution, it has kept increasing and has not fallen since the 2015-16 season.

So it is clear that a team that wants to win needs to find new ways to score more. Next we explore some factors that we think might play a role in the result of a game.

lose_game=
  box_score_all %>% 
  filter(wl == "L") 

  box_score_all %>% 
  filter(wl == "W") %>% 
  ggplot(aes(x = pts, y = season_year)) + 
  geom_density_ridges(scale = .8, alpha = .5, fill = "blue", 
                      quantile_lines = T, quantile_fun = mean) + 
  geom_density_ridges(data = lose_game, aes(x = pts, y = season_year), 
                      scale = .8, alpha = .5, fill = "salmon", 
                      quantile_lines = T, quantile_fun = mean) + 
  scale_fill_manual(name = "Team", values = cols) + 
  xlim(65, 140) + 
  labs(x = "Scores", 
       y = "Season Year", 
       title = "Score Distribution of Winning and Losing Teams")  

Variables that can affect the result of a game

Because a team needs to score more to win, we first look at variables that directly influence the score. We show the plots of percentage and attempts together, so we can observe the trend of NBA scoring strategy over these 20 years.

First, although field goal attempts and percentage do not seem to have changed much over these 20 years, winning teams have a much higher field goal percentage than losing teams. Losing teams also have slightly more field goal attempts than winning teams.

Second, the three-point field goal percentage did not change much in these two decades. However, there is a marked increase in three-point field goal attempts. After the small ball era began in 2015, three-point attempts grew remarkably. As with field goal attempts, losing teams also have more three-point attempts.

Third, free throw attempts and percentage did not change much over these 20 years. Winning teams have more attempts and a higher percentage.

From these variables, we can conclude that games since 2015 have become something like a three-point contest, with every team taking as many threes as it can. The most notable difference in attempts between winning and losing teams is in free throw attempts, which means that even though a free throw is worth only one point, it is still an indicator of the game result. The most notable difference in percentage between winning and losing teams is in field goal percentage, which suggests that improving field goal percentage is one of the most critical things a team should consider if it wants to win.

Since high three-point attempt totals have become the trend in every game, we looked up the three highest three-point-attempt outliers on the plot and found that all of them belong to the Houston Rockets. Looking deeper, of the ten highest single-game three-point attempt totals in these 20 years, the Houston Rockets account for 8 and the Atlanta Hawks for the other 2.
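
The outlier lookup described above amounts to sorting games by three-point attempts and counting teams among the top ten. A minimal sketch on invented rows is shown below; the values are hypothetical and do not reproduce our actual top ten.

```r
# Invented single-game three-point attempt totals (not from box_score_all)
toy_attempts = data.frame(
  team_abbreviation = c("HOU", "HOU", "ATL", "NYK", "HOU", "BOS",
                        "ATL", "HOU", "GSW", "HOU", "DAL", "HOU"),
  fg3a = c(70, 61, 59, 35, 58, 42, 57, 56, 50, 55, 54, 53)
)

# Sort by attempts in descending order and keep the ten highest games
top10 = head(toy_attempts[order(-toy_attempts$fg3a), ], 10)

# Count how many of those games belong to each team
team_counts = table(top10$team_abbreviation)
team_counts
```

On the real data, the same steps applied to box_score_all (with fg3a and team_abbreviation) give the team counts quoted in the text.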

Field Goal

plot_ly( box_score_all, x = ~ season_year, y = ~ fga , color = ~ wl, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Field Goal Attempted"))
plot_ly( box_score_all, x = ~ season_year, y = ~ fg_pct, color = ~ wl, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Field Goal Percentage"))

3 Point Field Goals

plot_ly(box_score_all, x = ~ season_year, y = ~ fg3a, color = ~ wl, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "3 Point Field Goals Attempted"))
plot_ly(box_score_all, x = ~ season_year, y = ~ fg3_pct, color = ~ wl, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "3 Point Field Goals Percentage"))

Free Throw

plot_ly(box_score_all, x = ~ season_year, y = ~ fta, color = ~ wl, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Free Throw Attempted"))
plot_ly(box_score_all, x = ~ season_year, y = ~ ft_pct, color = ~ wl, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Free Throw Percent"))

New York Knicks 3 Point Field Goals

nyk= 
  box_score_all %>% 
  filter(team_abbreviation =="NYK")

3 Point Field Goals Attempted

plot_ly(nyk, x = ~ season_year, y = ~ fg3a, color = ~ wl, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "3 Point Field Goals Attempted"))

3 Point Field Goals Percentage

plot_ly(nyk, x = ~ season_year, y = ~ fg3_pct, color = ~ wl, type = "box") %>% 
  layout(boxmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "3 Point Field Goals Percentage"))

Offensive Level Parameters

Next, we explore the influence of some offensive strategies, to see what kinds of techniques might play a role in the result of a game.

Average offensive rebounds are slightly higher for losing teams, while average assists per game are significantly higher for winning teams.

The higher offensive rebounds of losing teams follow the same pattern as their higher number of attempts. We hypothesize that when a team starts to fall behind, it adopts a more aggressive strategy than the team in the lead.

The significantly higher assists of winning teams may result from the fact that an assist is only credited on a made basket, so more assists mean more scoring and a better chance of winning.

Average offensive rebounds per game

offensive_df %>% 
  group_by(season_year, wl) %>% 
  summarise(oreb= mean(oreb)) %>% 
  plot_ly(x = ~ season_year, y = ~ oreb, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average offensive rebounds of each game"))

Average assists per game

offensive_df %>% 
  group_by(season_year, wl) %>% 
  summarise(ast= mean(ast)) %>% 
  plot_ly(x = ~ season_year, y = ~ ast, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average assists of each game"))

Defensive level Parameters

In this part, we explore how defensive strategies on the court relate to the result of a game.

Steals, blocks, and defensive rebounds per game are clearly higher for the winning team, while personal fouls and turnovers per game are slightly higher for the losing team.

Steals of each game

box_score_all %>% 
  group_by(season_year, wl) %>% 
  summarise(stl= mean(stl)) %>% 
  plot_ly(x = ~ season_year, y = ~ stl, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average steals of each game"))

Blocks of each game

box_score_all %>% 
  group_by(season_year, wl) %>% 
  summarise(blk= mean(blk)) %>% 
  plot_ly(x = ~ season_year, y = ~ blk, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average blocks of each game"))

Defensive rebounds of each game

box_score_all %>% 
  group_by(season_year, wl) %>% 
  summarise(dreb= mean(dreb)) %>% 
  plot_ly(x = ~ season_year, y = ~ dreb, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average defensive rebounds of each game"))

Turnovers of each game

box_score_all %>% 
  group_by(season_year, wl) %>% 
  summarise(tov= mean(tov)) %>% 
  plot_ly(x = ~ season_year, y = ~ tov, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average turnovers of each game"))

Personal foul of each game

box_score_all %>% 
  group_by(season_year, wl) %>% 
  summarise(pf= mean(pf)) %>% 
  plot_ly(x = ~ season_year, y = ~ pf, type = "bar",
    color = ~wl, colors = "viridis") %>% 
  layout(barmode = "group", 
         xaxis = list(title = 'Season Year'),
         yaxis = list(title = "Average personal fouls of each game"))

Regression

In this part, we use a linear model to quantify the relationship between the number of wins and the average performance parameters. To identify the factors that influence the result of a single game, we also fit a logistic regression to the box score data.

MLR exploration

predict_df %>% 
  select(-fg3a_total, -fg3m_total, -play_off_team, -conf_rank) %>% 
  ggpairs(columns = 4:16)

The correlations between the predictors are not very high, which reduces the risk of collinearity.

We used backward elimination to select the predictors.

First, we put all the candidate variables into the linear model (model1). The adjusted R squared of the full model is 0.5767; that is, about 57.67% of the variance in the response variable is explained by the predictors, after adjusting for the number of predictors.

model1 = lm(data = predict_df, wins_revised ~ pts_avg + tov_avg + fg3_p + fg3_r + stl + blk + dreb + poss_trans + poss_iso + poss_pr + ast_avg + passes_made)

Then, to get a model with a higher adjusted R squared, we removed the predictor with the highest p-value, passes_made. The adjusted R squared improved slightly to 0.5777.

model2 = lm(data = predict_df, wins_revised ~ pts_avg + tov_avg + fg3_p + fg3_r + stl + blk + dreb + poss_trans + poss_iso + poss_pr + ast_avg)

Next, we removed ast_avg, which has the highest p-value among the remaining variables, to see whether the adjusted R squared improves further.

model3 = lm(data = predict_df, wins_revised ~ pts_avg + tov_avg + fg3_p + fg3_r + stl + blk + dreb + poss_trans + poss_iso + poss_pr)
model3 %>% broom::tidy() %>% knitr::kable()
term             estimate   std.error  statistic    p.value
-----------  ------------  ----------  ---------  ---------
(Intercept)  -185.1802187  21.7897670  -8.498495  0.0000000
pts_avg         0.3251935   0.1946784   1.670414  0.0962036
tov_avg        -2.0555801   0.5646676  -3.640337  0.0003366
fg3_p         -22.3063836  11.5346505  -1.933859  0.0543629
fg3_r         270.4523865  37.1830136   7.273547  0.0000000
stl             5.6946036   0.8515696   6.687185  0.0000000
blk             0.9850197   0.8078006   1.219385  0.2239526
dreb            3.3423210   0.4578976   7.299276  0.0000000
poss_trans     -1.1493701   0.3017334  -3.809223  0.0001791
poss_iso        0.6970955   0.2287848   3.046949  0.0025830
poss_pr        -0.6435787   0.1555634  -4.137083  0.0000494

The adjusted R squared decreased to 0.5768, but the difference is small.
The remaining predictor with the highest p-value is blk. Because blocks are an important part of a basketball game, we do not exclude it from our model.
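
The elimination step described above can be sketched on simulated data. This is only an illustration under stated assumptions, not the fit reported here: the data frame `sim_df`, its columns `x1` to `x3`, and the helper `adj_r2()` are hypothetical names introduced for the example.

```r
# Simulated data (hypothetical): y depends on x1 and x2 but not on x3.
set.seed(1)
n = 200
sim_df = data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim_df$y = 2 * sim_df$x1 - 1.5 * sim_df$x2 + rnorm(n)

# Helper (hypothetical): adjusted R squared of a fitted lm object.
adj_r2 = function(fit) summary(fit)$adj.r.squared

# Step 1: fit the full model.
full_fit = lm(y ~ x1 + x2 + x3, data = sim_df)

# Step 2: find the predictor with the largest p-value (intercept excluded).
p_values = summary(full_fit)$coefficients[-1, "Pr(>|t|)"]
weakest = names(which.max(p_values))

# Step 3: refit without that predictor and compare adjusted R squared.
reduced_fit = update(full_fit, as.formula(paste(". ~ . -", weakest)))

c(full = adj_r2(full_fit), reduced = adj_r2(reduced_fit))
```

Repeating steps 2 and 3 until the adjusted R squared stops improving, or until the remaining predictors are worth keeping for subject-matter reasons, gives the kind of sequence used for model1 to model3.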

Cross Validation

We want to see which of the three models above generalizes best, so in this section we use cross validation to compare the candidate models.

set.seed(1000)

predict_cv_df = 
  predict_df %>% 
  crossv_mc(100) %>% 
  mutate(train = map(train, as_tibble), 
         test = map(test, as_tibble))

predict_cv_df = 
  predict_cv_df %>% 
  mutate(model1 = map(train, ~lm(wins_revised ~ pts_avg + ast_avg + tov_avg + fg3_p + fg3_r + stl + blk + dreb + poss_trans + passes_made + poss_iso + poss_pr, data = .x)),  
         model2 = map(train, ~lm(wins_revised ~ pts_avg + ast_avg + tov_avg + fg3_p + fg3_r + stl + blk + dreb + poss_trans + poss_iso + poss_pr, data = .x)), 
         model3 = map(train, ~lm(wins_revised ~ pts_avg + tov_avg + fg3_p + fg3_r + stl + blk + dreb + poss_trans + poss_iso + poss_pr, data = .x))) %>% 
  mutate(rmse1 = map2_dbl(model1, test, ~rmse(model = .x, data = .y)), 
         rmse2 = map2_dbl(model2, test, ~rmse(model = .x, data = .y)), 
         rmse3 = map2_dbl(model3, test, ~rmse(model = .x, data = .y)))
  

predict_cv_df %>% 
  select(starts_with("rmse")) %>% 
  pivot_longer(everything(), 
               names_to = "model", 
               names_prefix = "rmse", 
               values_to = "rmse") %>% 
  ggplot(aes(x = model, y = rmse, fill = model)) + 
  geom_boxplot(alpha = .6)

The RMSE distributions of the three models are very similar to each other, which indicates a similar level of generalizability. Therefore, following the principle of parsimony, we chose model3 as our final model.
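
For readers who want to see what the repeated train/test comparison does without the helper packages, here is a minimal base R sketch on simulated data. The names `sim_df`, `holdout_rmse()`, and `one_split()` are hypothetical, and the 80/20 split is an assumption made for the example.

```r
# Simulated data (hypothetical): y depends on x1 only; x2 is pure noise.
set.seed(1000)
n = 150
sim_df = data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim_df$y = 3 + 2 * sim_df$x1 + rnorm(n)

# Root mean squared error of a fitted model on held-out data.
holdout_rmse = function(fit, test) {
  sqrt(mean((test$y - predict(fit, newdata = test))^2))
}

# One Monte Carlo split: hold out a random 20% and return the
# test RMSE of each candidate model fitted on the other 80%.
one_split = function(df, test_frac = 0.2) {
  test_idx = sample(nrow(df), size = round(test_frac * nrow(df)))
  train = df[-test_idx, ]
  test = df[test_idx, ]
  c(small = holdout_rmse(lm(y ~ x1, data = train), test),
    large = holdout_rmse(lm(y ~ x1 + x2, data = train), test))
}

# Repeat the split 100 times: one row per split, one column per model.
cv_rmse = t(replicate(100, one_split(sim_df)))
colMeans(cv_rmse)
```

When the columns of `cv_rmse` have nearly the same distribution, as in the boxplot above, the extra predictors add little out-of-sample accuracy and the smaller model is a reasonable choice.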

Modeling

The fitted model for predicting the number of games won by a team is:

wins_revised = -185 + 0.33 * pts_avg - 2.06 * tov_avg - 22.3 * fg3_p + 270 * fg3_r + 5.69 * stl + 0.99 * blk + 3.34 * dreb - 1.15 * poss_trans + 0.70 * poss_iso - 0.64 * poss_pr

Model diagnostic

In the Residuals vs Fitted plot, the residuals are spread approximately evenly around 0. Heteroscedasticity does not appear to be a problem in this model, and no outlier has a large impact on the model fit.

par(mfrow = c(2,2))
plot(model2, which = 1)
plot(model2, which = 2)
plot(model2, which = 3)
plot(model2, which = 4)
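
The visual check for influential points can be complemented with a numerical one. The sketch below runs on simulated data; `sim_df` and `fit` are hypothetical names, and the 4 / n cutoff is a common rule of thumb, not a threshold used in this analysis.

```r
# Simulated data and fit (hypothetical), standing in for the real model.
set.seed(2)
n = 100
sim_df = data.frame(x = rnorm(n))
sim_df$y = 1 + 2 * sim_df$x + rnorm(n)
fit = lm(y ~ x, data = sim_df)

# Cook's distance for every observation (the quantity drawn by which = 4).
cooks_d = cooks.distance(fit)

# Flag observations above the 4 / n rule of thumb (an assumed cutoff).
influential = which(cooks_d > 4 / n)

# Largest Cook's distance and the number of flagged observations.
c(max_cooks_d = max(cooks_d), n_flagged = length(influential))
```

Flagged observations are candidates for a closer look, not automatic deletions.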

Interpretation of model coefficients

Most of the selected variables are significant at the 0.05 level in this linear regression model. The exceptions are pts_avg, fg3_p, and blk, and blk was kept for basketball reasons.
Given the units of the predictors, some variables are hard to change by much. By contrast, ordinary fluctuations in tov_avg, fg3_r, and dreb are enough to change the predicted number of wins.

Holding the other predictors fixed:
Each additional turnover per game is associated with about 2 more losses (2.06 fewer wins), which hurts badly.
Each additional percentage point of three-point shooting rate is associated with about 2.7 more wins, which helps the team a lot.
Each additional defensive rebound per game is associated with about 3.34 more wins. Securing the defensive rebound is critical.
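
The arithmetic behind these statements is the coefficient estimate multiplied by the change in the predictor. A short sketch, using the estimates from the model3 table and assuming that fg3_r is stored as a proportion, so that one percentage point is 0.01:

```r
# Coefficient estimates copied from the model3 table above.
coefs = c(tov_avg = -2.0555801, fg3_r = 270.4523865, dreb = 3.3423210)

# Change in each predictor being considered.
# Assumption: fg3_r is a proportion, so one percentage point is 0.01.
delta = c(tov_avg = 1, fg3_r = 0.01, dreb = 1)

# Expected change in wins, holding the other predictors fixed.
wins_change = round(coefs * delta, 2)
wins_change
```

If fg3_r were stored on a 0 to 100 scale instead, the `delta` for fg3_r would be 1 and the interpretation would change accordingly.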

Predictions on Knicks

Based on the box scores of this season, we use model 2 to predict the number of wins for all 30 teams. Ranking the teams by predicted wins, the Knicks are predicted to win 43.6 games and rank 8th in the Eastern Conference in the 2021-22 season. Since 8th is the last playoff spot, the Knicks have to improve their performance and win more games to secure a place in the playoffs.

top8_east = 
  prediction_21_22 %>% 
  head(8) %>% 
  left_join(new_season_df, by = c("season_year", "team_abbreviation", "conference")) %>% 
  group_by(season_year) %>% 
  mutate(ranking = row_number())

top8_east %>% 
  select(season_year, team_abbreviation, conference, ranking) %>% 
  knitr::kable("simple")
season_year  team_abbreviation  conference  ranking
-----------  -----------------  ----------  -------
2021-22      CHA                east              1
2021-22      MIL                east              2
2021-22      PHI                east              3
2021-22      BKN                east              4
2021-22      BOS                east              5
2021-22      ATL                east              6
2021-22      CHI                east              7
2021-22      NYK                east              8

Gaps in Performance

Let’s see what the Knicks can do to improve their performance and push into the playoffs.

From the plots below, we conclude that the Knicks need to improve in turnovers and three-point shooting. As the model shows, the number of turnovers is negatively associated with the number of wins, yet the Knicks currently have the second-highest number of turnovers per game. In addition, among the top 8 teams in the Eastern Conference, their high three-point attempt percentage combined with a low three-point shooting rate keeps their predicted win total down.

Turnover

top8_east %>% 
  ggplot(aes(x = reorder(team_abbreviation, tov_avg), y = tov_avg, fill = team_abbreviation)) + 
  geom_bar(stat = "identity") + 
  labs(
    x = "Team", 
    y = "Average Turnover per Game", 
    title = "Top 8 Team Average Turnover (East)"
  )

3 Point Field Goal Attempt Percentage

top8_east %>% 
  ggplot(aes(x = reorder(team_abbreviation, fg3_p), y = fg3_p, fill = team_abbreviation)) + 
  geom_bar(stat = "identity") + 
  labs(
    x = "Team",
    y = "Three Point Field Goal Attempt Percentage",
    title = "Top 8 Team Three Point Field Goal Attempt Percentage (East)"
  )

3 Pointer Rate

top8_east %>% 
  ggplot(aes(x = reorder(team_abbreviation, fg3_r), y = fg3_r, fill = team_abbreviation)) + 
  geom_bar(stat = "identity") + 
  labs(
    x = "Team",
    y = "Three Pointer Rate",
    title = "Top 8 Team Three Pointer Rate (East)"
  )

Steal

top8_east %>% 
  ggplot(aes(x = reorder(team_abbreviation, stl), y = stl, fill = team_abbreviation)) + 
  geom_bar(stat = "identity") + 
  labs(
    x = "Team", 
    y = "Average Steal per Game"
  )

Defensive Rebound

top8_east %>% 
  ggplot(aes(x = reorder(team_abbreviation, dreb), y = dreb, fill = team_abbreviation)) + 
  geom_bar(stat = "identity") + 
  labs(
    x = "Team", 
    y = "Average Defensive Rebound"
  )

Isolation

top8_east %>% 
  ggplot(aes(x = reorder(team_abbreviation, poss_iso), y = poss_iso, fill = team_abbreviation)) + 
  geom_bar(stat = "identity") + 
  labs(x = "Team", 
       y = "Average Isolation")

Pick and Roll

top8_east %>% 
  ggplot(aes(x = reorder(team_abbreviation, poss_pr), y = poss_pr, fill = team_abbreviation)) + 
  geom_bar(stat ="identity") + 
  labs(x = "Team", 
       y = "Average Pick and Roll")

Zoom in with Three Pointer - Knicks

Because the three-pointer is such a crucial factor for an NBA team to reach the playoffs, we look at the Knicks’ three-point shooting data to offer more specific suggestions. In this section, we compare the overall performance of the Knicks with the league average, and then draw the plots for some of the team's three-point leaders.

To make the hex plots in this part, we referred to the ballr package and Owen’s blog.

library(prismatic)
library(extrafont)
library(cowplot)

p = plot_court(court_themes$light) +
  geom_polygon(
    data = df,
    aes(
      x = adj_x,
      y = adj_y,
      group = hexbin_id, 
      fill = league_avg_diff, 
      color = after_scale(clr_darken(fill, .333))),
    size = .25) + 
  scale_x_continuous(limits = c(-27.5, 27.5)) + 
  scale_y_continuous(limits = c(0, 45)) +
  scale_fill_distiller(direction = -1, 
                       palette = "PuOr", 
                       limits = c(-.15, .15), 
                       breaks = seq(-.15, .15, .03),
                       labels = c("-15%", "-12%", "-9%", "-6%", "-3%", "0%", "+3%", "+6%", "+9%", "+12%", "+15%"),
                       "3FG Percentage Points vs. League Average") +
  guides(fill = guide_legend(
    label.position = 'bottom', 
    title.position = 'top', 
    keywidth = .45,
    keyheight = .15, 
    default.unit = "inch", 
    title.hjust = .5,
    title.vjust = 0,
    label.vjust = 3,
    nrow = 1))  +
  theme(legend.spacing.x = unit(0, 'cm'), 
        legend.title = element_text(size = 9), 
        legend.text = element_text(size = 8), 
        legend.margin = margin(-10,0,-1,0),
        legend.position = 'bottom',
        legend.box.margin = margin(-30,0,15,0), 
        plot.title = element_text(hjust = 0.5, vjust = -1, size = 15),
        plot.subtitle = element_text(hjust = 0.5, size = 8, vjust = -.5), 
        plot.caption = element_text(face = "italic", size = 8), 
        plot.margin = margin(0, -5, 0, -5, "cm")) +
  labs(title = "New York Knicks - Three Point",
       subtitle = "2021-22 Regular Season")

ggdraw(p) + 
  theme(plot.background = element_rect(fill="floralwhite", color = NA))

According to this hex plot, in the 2021-22 regular season the Knicks shoot three-pointers better than the league average on both wings and about level with the league average at the top of the key. However, they shoot worse than the league average in both corners. To be more specific, their three-point percentage is 6 percentage points higher than the league average on the right wing and 3 percentage points higher on the left wing. On the other hand, it is 6 and 3 percentage points lower than the league average in the right and left corners respectively. Therefore, the Knicks should run more three-point plays on the left and right wings, and their corner shooting should be strengthened through training.

Zoom in with Three Pointer - Knicks Team Leaders

To further understand which tactics the Knicks can deploy, we look at the shooting logs of the team's three-point leaders, Alec Burks, Kemba Walker, and Derrick Rose, who have the highest three-point shooting rates on the Knicks.

Alec’s Shooting

alec_p = 
  plot_court(court_themes$light) +
  geom_polygon(
    data = alec_df,
    aes(
      x = hex_data.adj_x,
      y = hex_data.adj_y,
      group = hex_data.hexbin_id, 
      fill = hex_data.league_avg_diff, 
      color = after_scale(clr_darken(fill, .333))),
    size = .25) + 
  scale_x_continuous(limits = c(-27.5, 27.5)) + 
  scale_y_continuous(limits = c(0, 45)) +
  scale_fill_distiller(direction = -1, 
                       palette = "PuOr", 
                       limits = c(-.15, .15), 
                       breaks = seq(-.15, .15, .03),
                       labels = c("-15%", "-12%", "-9%", "-6%", "-3%", "0%", "+3%", "+6%", "+9%", "+12%", "+15%"),
                       "3FG Percentage Points vs. League Average") +
  guides(fill = guide_legend(
    label.position = 'bottom', 
    title.position = 'top', 
    keywidth = .45,
    keyheight = .15, 
    default.unit = "inch", 
    title.hjust = .5,
    title.vjust = 0,
    label.vjust = 3,
    nrow = 1))  +
  theme(legend.spacing.x = unit(0, 'cm'), 
        legend.title = element_text(size = 9), 
        legend.text = element_text(size = 8), 
        legend.margin = margin(-10,0,-1,0),
        legend.position = 'bottom',
        legend.box.margin = margin(-30,0,15,0), 
        plot.title = element_text(hjust = 0.5, vjust = -1, size = 15),
        plot.subtitle = element_text(hjust = 0.5, size = 8, vjust = -.5), 
        plot.caption = element_text(face = "italic", size = 8), 
        plot.margin = margin(0, -5, 0, -5, "cm")) +
  labs(title = "Alec Burks - Three Point",
       subtitle = "2021-22 Regular Season")

ggdraw(alec_p) + 
  theme(plot.background = element_rect(fill="floralwhite", color = NA))

Walker’s Shooting

walker_p = 
  plot_court(court_themes$light) +
  geom_polygon(
    data = walker_df,
    aes(
      x = hex_data.adj_x,
      y = hex_data.adj_y,
      group = hex_data.hexbin_id, 
      fill = hex_data.league_avg_diff, 
      color = after_scale(clr_darken(fill, .333))),
    size = .25) + 
  scale_x_continuous(limits = c(-27.5, 27.5)) + 
  scale_y_continuous(limits = c(0, 45)) +
  scale_fill_distiller(direction = -1, 
                       palette = "PuOr", 
                       limits = c(-.15, .15), 
                       breaks = seq(-.15, .15, .03),
                       labels = c("-15%", "-12%", "-9%", "-6%", "-3%", "0%", "+3%", "+6%", "+9%", "+12%", "+15%"),
                       "3FG Percentage Points vs. League Average") +
  guides(fill = guide_legend(
    label.position = 'bottom', 
    title.position = 'top', 
    keywidth = .45,
    keyheight = .15, 
    default.unit = "inch", 
    title.hjust = .5,
    title.vjust = 0,
    label.vjust = 3,
    nrow = 1))  +
  theme(legend.spacing.x = unit(0, 'cm'), 
        legend.title = element_text(size = 9), 
        legend.text = element_text(size = 8), 
        legend.margin = margin(-10,0,-1,0),
        legend.position = 'bottom',
        legend.box.margin = margin(-30,0,15,0), 
        plot.title = element_text(hjust = 0.5, vjust = -1, size = 15),
        plot.subtitle = element_text(hjust = 0.5, size = 8, vjust = -.5), 
        plot.caption = element_text(face = "italic", size = 8), 
        plot.margin = margin(0, -5, 0, -5, "cm")) +
  labs(title = "Kemba Walker - Three Point",
       subtitle = "2021-22 Regular Season")

ggdraw(walker_p) + 
  theme(plot.background = element_rect(fill="floralwhite", color = NA))

Rose’s Shooting

rose_p = 
  plot_court(court_themes$light) +
  geom_polygon(
    data = rose_df,
    aes(
      x = hex_data.adj_x,
      y = hex_data.adj_y,
      group = hex_data.hexbin_id, 
      fill = hex_data.league_avg_diff, 
      color = after_scale(clr_darken(fill, .333))),
    size = .25) + 
  scale_x_continuous(limits = c(-27.5, 27.5)) + 
  scale_y_continuous(limits = c(0, 45)) +
  scale_fill_distiller(direction = -1, 
                       palette = "PuOr", 
                       limits = c(-.15, .15), 
                       breaks = seq(-.15, .15, .03),
                       labels = c("-15%", "-12%", "-9%", "-6%", "-3%", "0%", "+3%", "+6%", "+9%", "+12%", "+15%"),
                       "3FG Percentage Points vs. League Average") +
  guides(fill = guide_legend(
    label.position = 'bottom', 
    title.position = 'top', 
    keywidth = .45,
    keyheight = .15, 
    default.unit = "inch", 
    title.hjust = .5,
    title.vjust = 0,
    label.vjust = 3,
    nrow = 1))  +
  theme(legend.spacing.x = unit(0, 'cm'), 
        legend.title = element_text(size = 9), 
        legend.text = element_text(size = 8), 
        legend.margin = margin(-10,0,-1,0),
        legend.position = 'bottom',
        legend.box.margin = margin(-30,0,15,0), 
        plot.title = element_text(hjust = 0.5, vjust = -1, size = 15),
        plot.subtitle = element_text(hjust = 0.5, size = 8, vjust = -.5), 
        plot.caption = element_text(face = "italic", size = 8), 
        plot.margin = margin(0, -5, 0, -5, "cm")) +
  labs(title = "Derrick Rose - Three Point",
       subtitle = "2021-22 Regular Season")

ggdraw(rose_p) + 
  theme(plot.background = element_rect(fill="floralwhite", color = NA))

The three-point performance of the Knicks team leaders is in line with that of the whole team. In the 2021-22 season, none of the team leaders shoots better than the league average in both corners. However, they are more likely to make three-pointers from the wings. Therefore, we think the coach should design more plays for the team leaders on the wings. The players should attempt fewer corner shots in games but train more on shooting from that area.

Shiny App

Through the data analysis and model exploration, we identified features that contribute to the number of wins in a season, game results, and play scores. We think it is useful for fans to see how these features change across seasons for different NBA teams and to compare teams. Therefore, we created an interactive dashboard as a Shiny app for this purpose.

In this Shiny app, users can select a feature and one or more teams to visualize the teams' performance in that feature, make comparisons, and get ranking predictions for the teams in the 2021-22 season.

Conclusion and Discussion

First, to enter the playoffs, the Knicks need to win about 43 games in a regular season.

Second, the variables with the largest positive influence on the number of wins are the three-point shooting rate and the average number of defensive rebounds. The variable with the largest negative influence on the number of wins is the average number of turnovers.

Third, the New York Knicks rank second among the top eight teams in turnovers per game and fifth among the top eight teams in three-point shooting rate. Both stats are weaknesses of the New York Knicks. Specifically, their three-point shooting rate at the top of the key is slightly lower than the league average, and their corner three-point shooting rate is much lower than the league average. Leading players like Alec Burks, Kemba Walker, and Derrick Rose account for the low three-point shooting rate.

Finally, to get more wins, we strongly suggest that the Knicks take advantage of their hot zones by running more plays for wing three-pointers. In training, players should practice corner threes more. In addition, more team training is needed to reduce turnovers.